This paper presents a stream-oriented architecture for structuring cluster applications. Clusters that run applications based on this architecture can scale to tens of thousands of nodes with significantly less performance loss and fewer reliability problems. Our architecture exploits the stream nature of the data flow: it reduces congestion through load balancing, hides latency behind data pushes, and transparently handles node failures. In our ongoing work, we are developing an implementation of this architecture, and we are already able to run simple data mining applications on a cluster simulator.